Poster: swiftlet Date: Nov 8, 2009 6:11pm
Forum: texts Subject: how to index archive.org text?

I have an ebook meta-search web site, and we are thinking of indexing archive.org's 1.7m free ebooks for our users. Any idea what the best way to do that is without adding too much server load to archive.org?

I don't think it's a good idea to crawl archive.org page by page. Is there any robot-friendly page or, even better, a compressed catalog from archive.org? Something like Project Gutenberg's feeds: http://www.gutenberg.org/wiki/Gutenberg:Feeds

Basically, the format I need is quite simple: "Title", "Author", "Link" and "Format" (pdf, epub, html ...)
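
To make the requested record format concrete, here is a minimal sketch in Python. It assumes a hypothetical tab-separated catalog with a header row; the `EbookRecord` class, the `parse_catalog` helper, the column layout, and the sample row (including its placeholder link) are illustrative assumptions, not anything archive.org is known to publish.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EbookRecord:
    """One catalog entry with the four fields named in the post."""
    title: str
    author: str
    link: str
    format: str  # e.g. "pdf", "epub", "html"


def parse_catalog(text: str) -> List[EbookRecord]:
    """Parse a hypothetical tab-separated catalog dump (assumed layout)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        EbookRecord(
            title=row["Title"].strip(),
            author=row["Author"].strip(),
            link=row["Link"].strip(),
            format=row["Format"].strip().lower(),  # normalize "PDF" -> "pdf"
        )
        for row in reader
    ]


# Made-up sample data; the link is a placeholder, not a real item.
SAMPLE = (
    "Title\tAuthor\tLink\tFormat\n"
    "Example Book\tA. Writer\thttp://www.archive.org/details/example-id\tPDF\n"
)

records = parse_catalog(SAMPLE)
```

Reading one bulk file like this, rather than fetching a page per book, is what would keep the load on the source server low.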

Thanks!

Poster: garthus Date: Nov 8, 2009 6:30pm
Forum: texts Subject: Re: how to index archive.org text?

Swiftlet,

What is the link to your web site?

Gerry

Poster: swiftlet Date: Nov 8, 2009 7:59pm
Forum: texts Subject: Re: how to index archive.org text?

http://ebooks.addall.com/

You can turn on the advanced options or open the advanced search page to see the list of sites we search:
http://ebooks.addall.com/?showadvanced

Poster: Bede Date: Nov 22, 2009 12:57pm
Forum: texts Subject: Re: how to index archive.org text?

Is AddALL a commercial site?

Poster: swiftlet Date: Nov 22, 2009 1:15pm
Forum: texts Subject: Re: how to index archive.org text?

Does it matter?
What's the definition of a "commercial site"?

Poster: Time Traveller Date: Nov 23, 2009 8:42pm
Forum: texts Subject: Re: how to index archive.org text?

Commonly, the difference is seen in the fees charged.

If a fee for services rendered or goods supplied just covers costs (for example, the cost of a blank DVD, the time and resources spent burning data to the DVD, and postage and packing), then that DVD would be from a non-commercial supplier.

Apply the same rule elsewhere.

But in the case of another website indexing the IA to allow its users to search for IA texts, the only fee that can be charged before that website becomes commercial is the cost of the resources used to store the index, run searches, and serve back the hits.

There cannot be any charge at all for items downloaded from the IA if those items are discovered via this 3rd-party website's index.

The users of this 3rd-party website are not allowed to hide the fact that the actual texts their members find using the 3rd-party index are coming from the IA, nor can they hide the fact that the IA catalogue can be searched for free by going to www.archive.org.

One more definition is whether the 3rd-party website intends only to recover basic costs or to make an income and profit for its owners.

In my opinion, though, the intent of the IA is clear in the Terms of Use: no indexing whatsoever by 3rd-party websites. But there is no restriction on distributing the URL www.archive.org with a description on the WWW, left, right, centre and upside down.

Otherwise, contact IA management.

Poster: swiftlet Date: Nov 25, 2009 6:38am
Forum: texts Subject: Re: how to index archive.org text?

Thanks for your reply.
Since our users are free to use the search engine and free to link directly to IA after the search, and we do not serve any banner or AdSense ads on the search result page, I guess our site is non-commercial.
